Model-based methods to identify multiple cluster structures in a data set

Authors

  • Giuliano Galimberti
  • Gabriele Soffritti
Abstract

There is interest in identifying different partitions of a given set of units, each obtained from a different subset of the observed variables (multiple cluster structures). A model-based procedure has previously been developed for detecting multiple cluster structures from independent subsets of variables. The method relies on model-based clustering and on a comparison among mixture models using the Bayesian Information Criterion. A generalization of this method that allows the use of any model-selection criterion is considered. A new approach combining the generalized model-based procedure with variable-clustering methods is proposed. The usefulness of the new method is shown using simulated and real examples. Monte Carlo methods are employed to evaluate the performance of various approaches. Data matrices with two cluster structures are analyzed, taking into account the separation of clusters, the heterogeneity within clusters and the dependence of cluster structures. © 2007 Elsevier B.V. All rights reserved.
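
A sketch of the BIC comparison mentioned above, in notation assumed here rather than taken from the paper: let the observed variables be split into $T$ subsets $S_1, \dots, S_T$ that are treated as mutually independent, and let each subset be modelled by its own finite mixture with maximized likelihood $\hat{L}_t$ and $\nu_t$ free parameters, fitted to $n$ units. Because the joint likelihood then factorizes and the parameter counts add up, the criterion is additive over subsets:

```latex
\mathrm{BIC}(S_1, \dots, S_T)
  = \sum_{t=1}^{T} \mathrm{BIC}_t,
\qquad
\mathrm{BIC}_t = 2 \log \hat{L}_t - \nu_t \log n .
```

With this sign convention, a split into subsets is preferred to a single cluster structure on all variables ($T = 1$) when its summed BIC is larger. Swapping the penalty $\nu_t \log n$ for another one, for example $2\nu_t$ as in AIC, is one way to read the generalization to other model-selection criteria.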

Journal:
  • Computational Statistics & Data Analysis

Volume 52

Publication date: 2007